{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.3 多元线性回归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.3.1 多元线性回归的数学原理和代码实现"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.案例背景"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里以信用卡客户的客户价值来解释下客户价值预测的具体含义：客户价值预测就是指客户未来一段时间能带来多少利润，其利润的来源可能来自于信用卡的年费、取现手续费、分期手续费、境外交易手续费用等。而分析出客户的价值后，在进行营销、电话接听、催收、产品咨询等各项服务时，就可以针对高价值的客户进行区别于普通客户的服务，有助于进一步挖掘这些高价值客户的价值，并提高这些高价值客户的忠诚度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.读取数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>客户价值</th>\n",
       "      <th>历史贷款金额</th>\n",
       "      <th>贷款次数</th>\n",
       "      <th>学历</th>\n",
       "      <th>月收入</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1150</td>\n",
       "      <td>6488</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>9567</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1157</td>\n",
       "      <td>5194</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>10767</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1163</td>\n",
       "      <td>7066</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>9317</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>983</td>\n",
       "      <td>3550</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>10517</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1205</td>\n",
       "      <td>7847</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>11267</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   客户价值  历史贷款金额  贷款次数  学历    月收入  性别\n",
       "0  1150    6488     2   2   9567   1\n",
       "1  1157    5194     4   2  10767   0\n",
       "2  1163    7066     3   2   9317   0\n",
       "3   983    3550     3   2  10517   0\n",
       "4  1205    7847     3   3  11267   1"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_excel('客户价值数据表.xlsx')\n",
    "df.head()  # 显示前5行数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = df[['历史贷款金额', '贷款次数', '学历', '月收入', '性别']]\n",
    "Y = df['客户价值']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.模型搭建"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "regr = LinearRegression()\n",
    "regr.fit(X,Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4.线性回归方程构造"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5.71421731e-02, 9.61723492e+01, 1.13452022e+02, 5.61326459e-02,\n",
       "       1.97874093e+00])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "regr.coef_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "各系数为:[5.71421731e-02 9.61723492e+01 1.13452022e+02 5.61326459e-02\n",
      " 1.97874093e+00]\n",
      "常数项系数k0为:-208.42004079958383\n"
     ]
    }
   ],
   "source": [
    "print('各系数为:' + str(regr.coef_))\n",
    "print('常数项系数k0为:' + str(regr.intercept_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "其中这里通过regr.coef_获得是一个系数列表，分别对应不同特征变量前面的系数，也即k1、k2、k3、k4及k5，所以此时的多元线性回归曲线方程为：\n",
    "\n",
    "y = -208 + 0.057*x1 + 96*x2 + 113*x3 + 0.056*x4 + 1.97*x5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5.模型评估"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\Anaconda\\Anaconda\\lib\\site-packages\\numpy\\core\\fromnumeric.py:2495: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.\n",
      "  return ptp(axis=axis, out=out, **kwargs)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>OLS Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>          <td>客户价值</td>       <th>  R-squared:         </th> <td>   0.571</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                   <td>OLS</td>       <th>  Adj. R-squared:    </th> <td>   0.553</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>             <td>Least Squares</td>  <th>  F-statistic:       </th> <td>   32.44</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>             <td>Wed, 01 Jan 2020</td> <th>  Prob (F-statistic):</th> <td>6.41e-21</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                 <td>19:43:51</td>     <th>  Log-Likelihood:    </th> <td> -843.50</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>No. Observations:</th>      <td>   128</td>      <th>  AIC:               </th> <td>   1699.</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Residuals:</th>          <td>   122</td>      <th>  BIC:               </th> <td>   1716.</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Model:</th>              <td>     5</td>      <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Covariance Type:</th>      <td>nonrobust</td>    <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "     <td></td>       <th>coef</th>     <th>std err</th>      <th>t</th>      <th>P>|t|</th>  <th>[0.025</th>    <th>0.975]</th>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>const</th>  <td> -208.4200</td> <td>  163.810</td> <td>   -1.272</td> <td> 0.206</td> <td> -532.699</td> <td>  115.859</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>历史贷款金额</th> <td>    0.0571</td> <td>    0.010</td> <td>    5.945</td> <td> 0.000</td> <td>    0.038</td> <td>    0.076</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>贷款次数</th>   <td>   96.1723</td> <td>   25.962</td> <td>    3.704</td> <td> 0.000</td> <td>   44.778</td> <td>  147.567</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>学历</th>     <td>  113.4520</td> <td>   37.909</td> <td>    2.993</td> <td> 0.003</td> <td>   38.406</td> <td>  188.498</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>月收入</th>    <td>    0.0561</td> <td>    0.019</td> <td>    2.941</td> <td> 0.004</td> <td>    0.018</td> <td>    0.094</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>性别</th>     <td>    1.9787</td> <td>   32.286</td> <td>    0.061</td> <td> 0.951</td> <td>  -61.934</td> <td>   65.891</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "  <th>Omnibus:</th>       <td> 1.597</td> <th>  Durbin-Watson:     </th> <td>   2.155</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Prob(Omnibus):</th> <td> 0.450</td> <th>  Jarque-Bera (JB):  </th> <td>   1.538</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Skew:</th>          <td> 0.264</td> <th>  Prob(JB):          </th> <td>   0.464</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Kurtosis:</th>      <td> 2.900</td> <th>  Cond. No.          </th> <td>1.28e+05</td>\n",
       "</tr>\n",
       "</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.<br/>[2] The condition number is large, 1.28e+05. This might indicate that there are<br/>strong multicollinearity or other numerical problems."
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            OLS Regression Results                            \n",
       "==============================================================================\n",
       "Dep. Variable:                   客户价值   R-squared:                       0.571\n",
       "Model:                            OLS   Adj. R-squared:                  0.553\n",
       "Method:                 Least Squares   F-statistic:                     32.44\n",
       "Date:                Wed, 01 Jan 2020   Prob (F-statistic):           6.41e-21\n",
       "Time:                        19:43:51   Log-Likelihood:                -843.50\n",
       "No. Observations:                 128   AIC:                             1699.\n",
       "Df Residuals:                     122   BIC:                             1716.\n",
       "Df Model:                           5                                         \n",
       "Covariance Type:            nonrobust                                         \n",
       "==============================================================================\n",
       "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
       "------------------------------------------------------------------------------\n",
       "const       -208.4200    163.810     -1.272      0.206    -532.699     115.859\n",
       "历史贷款金额         0.0571      0.010      5.945      0.000       0.038       0.076\n",
       "贷款次数          96.1723     25.962      3.704      0.000      44.778     147.567\n",
       "学历           113.4520     37.909      2.993      0.003      38.406     188.498\n",
       "月收入            0.0561      0.019      2.941      0.004       0.018       0.094\n",
       "性别             1.9787     32.286      0.061      0.951     -61.934      65.891\n",
       "==============================================================================\n",
       "Omnibus:                        1.597   Durbin-Watson:                   2.155\n",
       "Prob(Omnibus):                  0.450   Jarque-Bera (JB):                1.538\n",
       "Skew:                           0.264   Prob(JB):                        0.464\n",
       "Kurtosis:                       2.900   Cond. No.                     1.28e+05\n",
       "==============================================================================\n",
       "\n",
       "Warnings:\n",
       "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
       "[2] The condition number is large, 1.28e+05. This might indicate that there are\n",
       "strong multicollinearity or other numerical problems.\n",
       "\"\"\""
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import statsmodels.api as sm  # 引入线性回归模型评估相关库\n",
    "X2 = sm.add_constant(X)\n",
    "est = sm.OLS(Y, X2).fit()\n",
    "est.summary()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
